1.0 INTRODUCTION


NASA has embarked on a program to increase the effectiveness and 
efficiency of the system that couples the user of space data with the sen- 
sors that acquire this data. This program, the NASA End-to-End Data System 
(NEEDS), addresses the identification, development, and demonstration of 
data handling techniques and technologies which are required to accomplish 
these goals. 

More specifically, the NEEDS program goals present a requirement for 
on-board signal processing to achieve user-compatible, information-adaptive 
data acquisition. These signal processing functions comprise a major con- 
stituent of the Information Adaptive System (IAS), a significant module of 
the NEEDS concept. The IAS essentially consists of the spaceborne portion 
of NEEDS exclusive of telemetry, support, and housekeeping functions. 

This volume addresses the impact of data set selection on the data
formatting required for efficient telemetering of the acquired sensor
data. More specifically, the FILE algorithm developed by Martin-Marietta
[1] provides a means of determining those pixels for which the earth's
surface is obscured by clouds. Deleting these pixels from the data stream
improves the achievable system throughput. This is necessary because
future sensors requiring a throughput of 370 Mb/sec (e.g., the MLA aboard
the OEOS) will use the TDRSS, with a capacity of only 120 Mb/sec, as their
telemetry vehicle.
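The size of the shortfall follows directly from the two rates quoted above: averaged over time, the telemetered data volume must be reduced by at least their ratio.

$$\mathrm{CR}_{\min} = \frac{370\ \mathrm{Mb/sec}}{120\ \mathrm{Mb/sec}} \approx 3.1$$

This is only a long-term average requirement; it says nothing about how the reduction is distributed in time, and it is that distribution which determines buffering.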

It will be seen that, because the spatial distribution of cloud cover
lacks statistical stationarity, there are periods during which data
acquisition rates exceed the throughput capacity. The study therefore
addresses various approaches to data compression and truncation as
applicable to this sensor mission. Two novel approaches are proposed:
band-to-band DPCM and dynamic companding. The volume concludes with
recommendations for further study.



2.0 DATA SET SELECTION - FILE[1] 


The data set selection algorithm addressed by this study is the
Feature Identification and Location Experiment (FILE), being developed for
NASA (LaRC) under contract by Martin-Marietta. The goal of this program is
to test a technique that uses spectral-radiance ratio detection to
classify, autonomously, the picture elements in a solid-state camera image
of the earth. Initially, pixel classification focused on four main groups:
vegetation, bare land, water, and clouds or snow/ice. Previous studies had
shown that these four categories can be identified by radiance
measurements at two discrete wavelengths: 0.65 and 0.85 µm. Figure 2-1
shows how the various categories can be separated by radiance ratioing at
these two wavelengths. Water and vegetation can be separated from clouds,
snow (ice), and bare land on the basis of the IR/red ratio alone. Bare
land can be separated from clouds and snow on the basis of overall
radiance level, given approximate knowledge of the solar illumination
angle. Figure 2-2 shows how the voltages from the imaging camera are used
to provide this feature classification. The 99 percent confidence
polygons are based on a computer model that considers such parameters as
visibility, illumination angle, sensor noise, pixel uniformity and dark
current, viewing angle, variations in bare land and vegetation types, and
some variation in water turbidity.

More recent efforts by Martin-Marietta have addressed a cloud detector
capable of discriminating between clouds and snow/ice. Both clouds and
snow/ice are known to have a high reflectance in the visible spectrum; at
1.55 µm, however, only clouds maintain a high reflectance. The addition
of this band forms the basic ingredient of the "FILE II" algorithm, which
is the subject of the present study.
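The classification logic described above can be sketched as a simple decision tree. This is a minimal illustration, not the FILE implementation: the actual decision boundaries are the confidence polygons of Figure 2-2, whereas the thresholds below (WATER_MAX_RATIO, VEGETATION_MIN_RATIO, BRIGHT_MIN, SWIR_CLOUD_MIN) are hypothetical placeholders, and dividing by the cosine of the solar zenith angle is an assumed, crude form of illumination correction.

```python
from enum import Enum


class Category(Enum):
    WATER = "water"
    VEGETATION = "vegetation"
    BARE_LAND = "bare land"
    CLOUD = "cloud"
    SNOW_ICE = "snow/ice"


# Hypothetical thresholds for illustration only; the real FILE boundaries
# are the 99 percent confidence polygons of Figure 2-2.
WATER_MAX_RATIO = 0.8       # IR/red ratio below this: water
VEGETATION_MIN_RATIO = 2.0  # IR/red ratio above this: vegetation
BRIGHT_MIN = 0.5            # normalized red level above this: cloud or snow/ice
SWIR_CLOUD_MIN = 0.3        # normalized 1.55 um level above this: cloud


def classify(red, ir, swir, cos_sun_zenith):
    """Classify one pixel from its 0.65 um (red), 0.85 um (ir) and 1.55 um
    (swir) levels. Assumes red > 0 and cos_sun_zenith > 0."""
    ratio = ir / red
    if ratio < WATER_MAX_RATIO:
        return Category.WATER
    if ratio > VEGETATION_MIN_RATIO:
        return Category.VEGETATION
    # Crude illumination correction (assumption): divide by cos(solar zenith).
    if red / cos_sun_zenith < BRIGHT_MIN:
        return Category.BARE_LAND
    # FILE II step: only clouds stay bright at 1.55 um.
    if swir / cos_sun_zenith > SWIR_CLOUD_MIN:
        return Category.CLOUD
    return Category.SNOW_ICE
```

The ordering mirrors the text: the IR/red ratio isolates water and vegetation first, overall brightness then separates bare land, and only the remaining bright pixels need the 1.55 µm test.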
3.0 CLOUD COVER STATISTICS


To take advantage of the FILE algorithm's ability to identify cloud
pixels, the data in cloud-free regions must be buffered until cloud pixels
occur. At that time the buffer is emptied into the data stream, and
compression is effectively achieved. To assess the size of the on-board
memory needed to buffer the cloud-free data until cloud cover is
encountered, it became necessary to examine the homogeneity of global
cloud cover.
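The buffering idea can be illustrated with a simple fill-and-drain model. In this sketch (an assumption, not a design from this study) every clear pixel writes its bits into a FIFO, cloud pixels are discarded, and the downlink removes a fixed number of bits per pixel period; the function name and parameters are hypothetical.

```python
def buffer_occupancy(cloud_flags, bits_per_pixel, drain_bits_per_pixel):
    """Simulate a FIFO that stores clear pixels and drains at the link rate.

    cloud_flags: sequence of booleans, True where the pixel is flagged as cloud.
    bits_per_pixel: bits written to the buffer for each clear pixel.
    drain_bits_per_pixel: bits the link removes per pixel period
        (link capacity divided by pixel rate).
    Returns (peak, final) occupancy in bits, measured at the end of each period.
    """
    occupancy = 0
    peak = 0
    for is_cloud in cloud_flags:
        if not is_cloud:
            occupancy += bits_per_pixel  # clear pixels are kept
        # The link drains every period; the buffer cannot go negative.
        occupancy = max(0, occupancy - drain_bits_per_pixel)
        peak = max(peak, occupancy)
    return peak, occupancy
```

With the rates quoted in Section 1.0 the drain is 120/370 of the input, so each clear pixel leaves roughly two-thirds of its bits in the buffer; a long cloud-free stretch therefore grows the buffer almost linearly.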

Various research studies indicate that global cloud cover remains
reasonably constant at about 40% [2]. If this figure were statistically
stationary, bandwidth compression based on the editing of cloud pixels
would be a highly feasible technique. There are, however, reports of
large areas of the earth's surface with virtually no cloud cover.
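Even if the 40% figure were stationary, cloud editing alone would not close the gap between the 370 Mb/sec sensor rate and the 120 Mb/sec TDRSS capacity quoted in Section 1.0. A short check under that optimistic, uniform-cloud assumption (function names are illustrative):

```python
def rate_after_cloud_editing(sensor_rate_mbps, cloud_fraction):
    """Average data rate once cloud pixels are deleted, assuming the cloud
    fraction is stationary (the text cautions that it is not)."""
    return sensor_rate_mbps * (1.0 - cloud_fraction)


def additional_compression_needed(sensor_rate_mbps, link_rate_mbps, cloud_fraction):
    """Extra compression factor still required after cloud editing;
    a value above 1 means the link is still oversubscribed."""
    return rate_after_cloud_editing(sensor_rate_mbps, cloud_fraction) / link_rate_mbps


print(rate_after_cloud_editing(370, 0.4), additional_compression_needed(370, 120, 0.4))
```

For these numbers the edited rate is about 222 Mb/sec, so a further factor of about 1.85 is still needed on average, before allowing for cloud-free regions.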

For example, Figure 3-1 [3] presents the frequency of cloud cover over
three sample areas. Observe that during winter (1800 LST) 80% sky cover
exists only 5% of the time, and less than 30% sky cover exists 70% of the
time. From this type of data one could derive local estimates of the
probability of cloud detection. However, this probability is not the
appropriate statistic for determining buffer memory size. The required
statistic is the distribution of the distance between cloud pixels. Such
data are likely to be Poisson distributed along-scan and to behave in a
Markov fashion from scan to scan. Numerous studies, in addition to the
above references, have examined cloud characteristics such as the
"viewability" of a region and the number of looks required to be certain
of obtaining coverage. However, none were found that specifically
addressed the statistics of distance between cloud pixels. These
statistics cannot easily be generated from image data, because extensive
correlation with ground-truth identification of cloud pixels is required
(and it is doubtful that such ground truth data are readily available).
However, examination of the tabular data in [3] shows that, quite often,
extremely large areas of interest are almost totally cloud-free.
Obviously, a buffer sized for this worst case is not reasonable. It is
suggested that a buffer memory of nominal size be employed and that the
data be further encoded or truncated to achieve compression. The
combination of encoding and cloud cover elimination should work very
effectively, since those areas of the world which are low in cloud cover
(e.g., deserts) also tend to be low in entropy.

Figure 3-1. Cloud cover distributions demonstrating regional homogeneity
for Region 1. [3]
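The statistic called for above, the distribution of distance between cloud pixels, can be illustrated with a swapped-in model: a two-state (clear/cloud) Markov chain along a single scan line. Its clear-run lengths are geometric, the discrete counterpart of the exponential spacing of a Poisson process. The transition probabilities below are placeholders, not measured cloud statistics, and scan-to-scan dependence is not modeled.

```python
import random


def markov_cloud_scan(n, p_clear_to_cloud, p_cloud_to_clear, rng):
    """One scan line of cloud flags (True = cloud) from a two-state Markov
    chain. Starting in the clear state is an arbitrary choice."""
    flags = []
    cloudy = False
    for _ in range(n):
        if cloudy:
            cloudy = rng.random() >= p_cloud_to_clear  # stay cloudy unless we leave
        else:
            cloudy = rng.random() < p_clear_to_cloud   # enter cloud with small probability
        flags.append(cloudy)
    return flags


def clear_run_lengths(flags):
    """Lengths of maximal runs of clear pixels, i.e. the gaps between cloud
    pixels that an on-board buffer would have to bridge."""
    runs = []
    current = 0
    for cloudy in flags:
        if cloudy:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


# Placeholder transition probabilities, not measured cloud statistics.
rng = random.Random(1)
runs = clear_run_lengths(markov_cloud_scan(200_000, 0.01, 0.5, rng))
print(f"{len(runs)} gaps, mean {sum(runs) / len(runs):.1f} px, longest {max(runs)} px")
```

With p_clear_to_cloud = 0.01 the mean gap should be near 100 pixels; the longest gap in the printout, not the mean, is what a buffer sized from such a model would have to cover.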

Novel approaches to achieve data compression are discussed in the following 
sections. 
4.0 DATA COMPRESSION


4.1 Introduction 

This section briefly presents well-known techniques of data compression
as a setting for the introduction of what is considered to be a novel
approach to compression: namely, band-to-band Differential Pulse Code
Modulation. The theory of coding to achieve data compression is well
documented in the literature ([4] and [5] are two excellent survey
papers), and only the salient points will be addressed here.

Information coding may be divided into two general categories:
information-preserving codes, and codes which introduce distortions that
are not usually obvious to the observer. The former remove statistical
redundancy, while the latter remove psychophysical redundancy. Statistical
redundancy removal usually results in compression ratios of less than two
or three, as exemplified by the Huffman and Shannon-Fano codes. Other
information-preserving codes of interest include Pulse Code Modulation
(PCM) and predictive codes such as Differential PCM; as noted earlier,
these are discussed in further detail in later paragraphs. Codes which
remove psychophysical redundancy at the expense of some degree of
distortion include transform codes such as the Karhunen-Loeve and
Hadamard transforms. These codes can achieve compression ratios of
several orders of magnitude in applications where only gross
interpretation of pictorial data by a human observer is required.
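To make the modest gains of statistical coding concrete, the sketch below computes Huffman codeword lengths for a source distribution and compares the mean codeword length with the entropy and with a fixed-length code. The eight-symbol distribution is hypothetical, chosen only for illustration; the function names are likewise illustrative.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Return the Huffman codeword length for each symbol probability."""
    if len(probs) == 1:
        return [1]
    tie = count()  # unique tiebreaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every symbol in the merged subtree gains one bit
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_code_length(probs, lengths):
    return sum(p * n for p, n in zip(probs, lengths))


# Hypothetical 8-symbol source skewed toward a few values.
probs = [0.4, 0.2, 0.1, 0.1, 0.08, 0.06, 0.04, 0.02]
lengths = huffman_code_lengths(probs)
fixed = math.ceil(math.log2(len(probs)))
print(f"entropy {entropy_bits(probs):.3f}, Huffman {mean_code_length(probs, lengths):.3f}, "
      f"fixed {fixed} bits")
```

The Huffman mean length always lies between the entropy and the entropy plus one bit, so the ratio to a fixed-length code stays small unless the distribution is very peaked.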

Table 4-1 presents a "tree" diagram of coding techniques commonly used in
image transmission.

4.2 Pulse Code Modulation (PCM) 

PCM is a straightforward representation of a signal by a time-discrete,
amplitude-discrete sequence of values. As shown in Figure 4-1, the
waveform is sampled (usually at the Nyquist rate) and each sample is
quantized using some prespecified number of levels. Each level is then
represented by a binary word of length W such that the number of levels
equals $2^W$ (e.g., an 8-bit word is used to represent 256 levels). PCM is
not a redundancy-removing code and is therefore often used as a baseline
for comparing other schemes.

Figure 4-1. PCM Encoding: (a) components of a PCM encoder,
(b) Four-bit binary representation of amplitude
levels 0 to 15. [4]

Table 4-1. Image transmission coding techniques.
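The quantize-and-code steps of PCM can be sketched as a uniform quantizer that maps a known amplitude range onto 2^W levels, each written as a W-bit binary word. Sampling is assumed to have been done already; the uniform (equal-step) quantizer and the clipping of out-of-range samples are assumptions of this sketch.

```python
import math


def pcm_encode(samples, word_bits, v_min, v_max):
    """Uniformly quantize each sample to one of 2**word_bits levels."""
    levels = 2 ** word_bits
    scale = (levels - 1) / (v_max - v_min)
    codes = []
    for x in samples:
        x = min(max(x, v_min), v_max)  # clip to the quantizer's range
        codes.append(int(math.floor((x - v_min) * scale + 0.5)))  # nearest level
    return codes


def pcm_decode(codes, word_bits, v_min, v_max):
    """Map level indices back to amplitudes at the level centers."""
    step = (v_max - v_min) / (2 ** word_bits - 1)
    return [v_min + c * step for c in codes]


def to_binary_words(codes, word_bits):
    """Fixed-length binary words, as in the four-bit example of Figure 4-1."""
    return [format(c, f"0{word_bits}b") for c in codes]
```

For W = 4 the codes run from 0 to 15, e.g. `to_binary_words([0, 8, 15], 4)` gives `['0000', '1000', '1111']`; every word has the same length regardless of how likely the level is, which is why PCM removes no redundancy.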

4.3 Predictive Techniques[5] 

Imagery in particular has both temporal and spatial redundancy from one
sample to the next. As each sample is transmitted, advantage can be taken
of the fact that previously transmitted samples may contain information
about it. Consider a sequence $\{u_k\}$ and suppose the information in
the samples up to $n = k-1$ has already been transmitted by some means.
One can formulate a prediction $\bar{u}_k$ of $u_k$ from these $k-1$
samples and transmit only the difference sequence defined by

$$e_k = u_k - \bar{u}_k .$$

If $e_k^*$ is the quantized value of $e_k$, the reproduced value of $u_k$
is given by

$$u_k^* = \bar{u}_k + e_k^* .$$
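The benefit of transmitting the difference rather than the sample itself can be checked numerically. This sketch assumes a synthetic first-order Markov (AR(1)) sequence with correlation rho and uses the simple predictor rho times the previous true sample (open loop, no quantizer yet); it reports half the base-2 logarithm of the variance ratio, the bits-per-sample measure used in the rate comparison of Section 4.4.

```python
import math
import random


def ar1_sequence(n, rho, rng):
    """Synthetic AR(1) sequence u_k = rho*u_{k-1} + w_k with unit-variance
    white w_k, started from its stationary distribution (an assumption)."""
    u = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
    seq = []
    for _ in range(n):
        seq.append(u)
        u = rho * u + rng.gauss(0.0, 1.0)
    return seq


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def prediction_gain_bits(seq, rho):
    """Bits/sample saved by sending e_k = u_k - rho*u_{k-1} instead of u_k:
    0.5 * log2(var(u) / var(e)). Open loop: no quantizer in the loop."""
    residuals = [seq[k] - rho * seq[k - 1] for k in range(1, len(seq))]
    return 0.5 * math.log2(variance(seq) / variance(residuals))


seq = ar1_sequence(100_000, 0.95, random.Random(7))
print(f"measured {prediction_gain_bits(seq, 0.95):.2f} bits/sample, "
      f"theory {-0.5 * math.log2(1 - 0.95 ** 2):.2f}")
```

For rho = 0.95 the measured value should land near the theoretical 1.68 bits/sample; for independent samples (rho = 0) it falls to about zero, matching the remark that prediction then offers no advantage.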

4.4 Differential Pulse Code Modulation (DPCM)[5] 

A widely used predictive technique for data transmission, called DPCM, is
shown in Figure 4-2. It is easy to deduce that the error in reproducing
$u_k$ is

$$\delta u_k = u_k - u_k^* = e_k - e_k^* = q_k ,$$

which is equal to the error in quantizing $e_k$. To minimize the variance
of the prediction error, $\bar{u}_k$ should be the conditional mean

$$\bar{u}_k = E\left[u_k \mid U_k^*\right],$$

where $U_k^*$ is the set of past reproduced values, i.e.,

$$U_k^* = \{u_n^*,\ n < k\}.$$
Figure 4-2. DPCM components.
The mean square distortion of $u_k$ is given by

$$E\left[(\delta u_k)^2\right] = E\left[q_k^2\right] = \sigma_q^2(k),$$

and the minimum rate is

$$R_{DPCM} = \frac{1}{2}\log_2\frac{\sigma_e^2(k)}{\sigma_q^2(k)},$$

where $\sigma_e^2(k)$ is the prediction error variance. If conventional PCM
were used, the minimum rate for $u_k$ would be

$$R_{PCM} = \frac{1}{2}\log_2\frac{\sigma_u^2(k)}{\sigma_q^2(k)}$$

for the same quantizing distortion $\sigma_q^2(k)$. The compression
achievable,

$$R_{PCM} - R_{DPCM} = \frac{1}{2}\log_2\frac{\sigma_u^2(k)}{\sigma_e^2(k)}\ \text{bits/sample},$$

is seen to depend on the variance reduction achieved by prediction (i.e.,
the ability to predict $u_k$) and therefore on the intersample dependence
of the sequence $\{u_k\}$. If all the samples are independent, then
$\bar{u}_k = E[u_k]$ and $\sigma_u^2(k) = \sigma_e^2(k)$, resulting in no
advantage over PCM. The underlying philosophy of predictive quantization
is to remove the mutual redundancy between successive samples and quantize
only the new information, i.e., the residuals. An important aspect of
DPCM is that the prediction is based on past output (reconstructed)
samples rather than past input samples. As a result, the predictor is in
the feedback loop around the quantizer, and the quantization error is fed
back to the quantizer input at the next step. This prevents accumulation
of errors in the reconstructed signal $u_k^*$.
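The closed-loop structure can be sketched in a few lines. This is a minimal one-dimensional DPCM under stated assumptions: a first-order predictor equal to a hypothetical coefficient `a` times the previous reconstructed value, a uniform quantizer with step size `step`, and a reconstructed value of zero before the first sample. Because the encoder predicts from its own reconstructions, the decoder, which receives only the quantizer indices, stays in step with it, and the reconstruction error at every sample is just the quantization error, bounded by half a step.

```python
import math


def quantize_index(e, step):
    """Uniform mid-tread quantizer: index of the nearest multiple of step."""
    return int(math.floor(e / step + 0.5))


def dpcm_encode(samples, a, step):
    """Closed-loop DPCM: predict from reconstructed (not input) samples."""
    indices, recon = [], []
    prev = 0.0  # reconstructed value before the first sample (assumption)
    for u in samples:
        pred = a * prev                       # prediction from past output
        q = quantize_index(u - pred, step)    # quantized residual index
        u_star = pred + q * step              # what the receiver will rebuild
        indices.append(q)
        recon.append(u_star)
        prev = u_star                         # feedback around the quantizer
    return indices, recon


def dpcm_decode(indices, a, step):
    """Receiver: same predictor, driven only by the transmitted indices."""
    out = []
    prev = 0.0
    for q in indices:
        u_star = a * prev + q * step
        out.append(u_star)
        prev = u_star
    return out
```

For the sequence [100, 102, 105, 104, 110, 120, 119] with a = 1 and step = 4, the indices are [25, 1, 0, 0, 2, 2, 0]: after the first sample only small integers need to be coded, and no reconstruction error exceeds 2.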


4.5 Two Dimensional DPCM[5] 


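The above can be extended to two dimensions if a reasonable causal
predictor is available for every pixel in the image. Consider, for
example, the model

$$u_{i,j} = \rho_1 u_{i-1,j} + \rho_2 u_{i,j-1} - \rho_1\rho_2 u_{i-1,j-1} + \varepsilon_{i,j}.$$

If $\beta^2 = \sigma^2(1-\rho_1^2)(1-\rho_2^2)$ and $\varepsilon_{i,j}$ is a
white noise field with variance $\beta^2$, this represents a random field
with autocorrelation

$$r(m,n) = \sigma^2 \rho_1^{|m|}\rho_2^{|n|}.$$

The $\rho_i$, $i = 1, 2$, are the one-step correlations of the field along
the "i" and "j" axes. The prediction error variance is given by

$$\sigma_e^2 = E\left[\varepsilon_{i,j}^2\right] = \beta^2 = \sigma^2(1-\rho_1^2)(1-\rho_2^2).$$

The DPCM equations corresponding to this model representation are:

Predictor:

$$\bar{u}_{i,j} = \rho_1 u_{i-1,j}^* + \rho_2 u_{i,j-1}^* - \rho_1\rho_2 u_{i-1,j-1}^*$$

Quantizer Input:

$$e_{i,j} = u_{i,j} - \bar{u}_{i,j}$$

Reconstructor:

$$u_{i,j}^* = \bar{u}_{i,j} + e_{i,j}^*$$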

The mechanization of the above is as shown in Figure 4-3. 
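The predictor, quantizer input, and reconstructor above can be sketched for a whole image as follows. Assumptions of this sketch: the image is a list of rows indexed (i, j), pixels outside the image are taken as zero, the quantizer is uniform with a hypothetical step size, and the coefficients rho1 and rho2 are supplied rather than estimated.

```python
import math


def _predict(recon, i, j, rho1, rho2):
    """rho1*u*[i-1][j] + rho2*u*[i][j-1] - rho1*rho2*u*[i-1][j-1], using
    reconstructed neighbors; out-of-image neighbors are zero (assumption)."""
    up = recon[i - 1][j] if i > 0 else 0.0
    left = recon[i][j - 1] if j > 0 else 0.0
    diag = recon[i - 1][j - 1] if i > 0 and j > 0 else 0.0
    return rho1 * up + rho2 * left - rho1 * rho2 * diag


def dpcm2d_encode(image, rho1, rho2, step):
    """Raster-scan 2-D DPCM; returns (quantizer indices, reconstruction)."""
    rows, cols = len(image), len(image[0])
    recon = [[0.0] * cols for _ in range(rows)]
    indices = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            pred = _predict(recon, i, j, rho1, rho2)
            q = int(math.floor((image[i][j] - pred) / step + 0.5))  # quantize e_ij
            indices[i][j] = q
            recon[i][j] = pred + q * step                            # reconstructor
    return indices, recon


def dpcm2d_decode(indices, rho1, rho2, step):
    """Receiver: rebuilds the image from indices with the same predictor."""
    rows, cols = len(indices), len(indices[0])
    recon = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            recon[i][j] = _predict(recon, i, j, rho1, rho2) + indices[i][j] * step
    return recon
```

Only already-reconstructed neighbors (above, left, and upper-left) are used, so the predictor is causal in raster order and the decoder can run it identically.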

For general imagery the $\rho_i$ are on the order of 0.95, from which it
can be deduced [5] that two-dimensional DPCM should require about 3.25
fewer bits/pel than PCM. Also, a third- or fourth-order predictor is
usually sufficient; increasing the order beyond this does not provide any
appreciable improvement in performance. The predictor coefficients are
found by minimizing the mean-square prediction error of the input data,
which leads to a set of linear equations that can be solved given some
knowledge of the image autocorrelation. The two-dimensional procedure
differs from the one-dimensional case in that it can lead to an unstable
causal model. This implies that the reconstruction filter can be
unstable, in the sense that transmission errors can be amplified at the
receiver. The prediction model must therefore be stabilized, by accepting
either a larger prediction error or a higher predictor order, before it
is used in the DPCM algorithm.
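The bits/pel saving quoted above follows from the variance ratio in the rate comparison of Section 4.4. The sketch below evaluates the idealized expression for the separable model; it ignores quantization noise fed back through the predictor, so it is optimistic. For rho1 = rho2 = 0.95 it gives about 3.36 bits/pel, the same order as the roughly 3.25 bits/pel cited from [5], and about 1.68 bits/pel for a one-dimensional predictor with rho = 0.95.

```python
import math


def dpcm_gain_bits(rho1, rho2=None):
    """Idealized bits/pel saved by DPCM relative to PCM at equal quantizing
    distortion, 0.5*log2(sigma_u^2 / sigma_e^2), ignoring quantizer feedback.

    One argument: line-by-line (1-D) predictor, sigma_e^2 = sigma^2 (1 - rho^2).
    Two arguments: separable 2-D model, sigma_e^2 = sigma^2 (1 - rho1^2)(1 - rho2^2).
    """
    if rho2 is None:
        return -0.5 * math.log2(1.0 - rho1 ** 2)
    return -0.5 * math.log2((1.0 - rho1 ** 2) * (1.0 - rho2 ** 2))


print(f"1-D: {dpcm_gain_bits(0.95):.2f} bits/pel, 2-D: {dpcm_gain_bits(0.95, 0.95):.2f} bits/pel")
```

For equal correlations along both axes the 2-D saving is exactly twice the 1-D saving, which is the gap between the line-by-line and two-dimensional curves one would expect in Figure 4-4.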
Figure 4-4. SNR versus rate of DPCM of two-dimensional, separable covariance causal
model images and its comparison with line-by-line DPCM and with PCM. [5] 



Figure 4-4 shows the performance of several DPCM methods and of PCM. For
typical 8-bit pixel images, compression ratios for two-dimensional DPCM
range from about 3 to 3.5.

DPCM is a simple, easily implemented, on-line compression technique that
can achieve useful compression ratios. Its three major drawbacks are:

(1) its sensitivity to variations in image statistics,

(2) its high sensitivity to channel errors, and

(3) its increased complexity for other types of data, such as those
represented by autoregressive moving-average models rather than by
purely autoregressive models.

DPCM techniques can be adapted to local variations in image statistics by 
adjusting the number of quantization levels according to local scene 
activity and/or modifying the predictor rule whenever atypical features 
such as steep slopes or edges are encountered. 
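One way to adapt the quantizer to local scene activity can be sketched as follows. The activity measure (mean absolute difference of horizontally adjacent pixels), the thresholds, and the step sizes are all hypothetical choices, and so is the direction of the rule (coarser steps for busier blocks, on the assumption that quantization noise is less visible there). Because the measure is computed from original pixels, the chosen step would have to be sent to the receiver as side information.

```python
def local_activity(block):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [abs(row[j + 1] - row[j]) for row in block for j in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def choose_step(block, thresholds=(2.0, 8.0), steps=(1, 2, 4)):
    """Pick a quantizer step for a block from its activity (hypothetical rule:
    thresholds and steps are placeholders, not values from this study)."""
    activity = local_activity(block)
    for limit, step in zip(thresholds, steps):
        if activity < limit:
            return step
    return steps[-1]  # busiest blocks get the coarsest step
```

A per-block step like this would then drive the uniform quantizer inside the DPCM loop; the predictor-switching alternative mentioned in the text is not shown.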

4.6 DPCM With Multispectral Predictor 

The previous discussion has indicated how the temporal and spatial
redundancy in an image can be used to realize data compression. In the
case of multispectral imagery, yet another dimension exists which may
provide further compression: the spectral dimension. What is envisioned
is to use the pixel value from one spectral band as the predictor for
subsequent bands. In this scheme, instead of the decoder being a replica
of the encoder based on some model, the reference band would be
transmitted intact using normal PCM and used at the receiver to
reconstruct the affected bands. Figure 4-5 indicates one possible
realization of this idea. Notice that the predictor error has been
eliminated and that, depending on the multiplex scheme adopted, the
strong sensitivity to channel errors is likely to be reduced (i.e.,
channel errors affect the reference and the residuals in a like manner,
resulting in error cancellation in many cases).
As shown in Figure 4-5, each band is DPCM encoded individually (i.e., with
respect to spatial redundancy) prior to spectral-domain processing.
Depending on how the redundancy manifests itself, this may not be the
desired approach. Exactly how to implement this combined encoding
strategy should be the subject of further study.
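A minimal sketch of the band-to-band idea, under assumptions not fixed by this volume: the target band is predicted from the PCM-transmitted reference band with a linear gain and offset fitted by least squares (these two numbers would be sent as side information), the residual is quantized with a uniform step, and the per-band spatial DPCM of Figure 4-5 is omitted for brevity. The function names are hypothetical.

```python
import math


def fit_band_predictor(reference, target):
    """Least-squares gain and offset so that target ~ gain*reference + offset
    (a hypothetical predictor form; the volume leaves the predictor open)."""
    n = len(reference)
    mr = sum(reference) / n
    mt = sum(target) / n
    var_r = sum((r - mr) ** 2 for r in reference)
    cov = sum((r - mr) * (t - mt) for r, t in zip(reference, target))
    gain = cov / var_r if var_r else 0.0
    return gain, mt - gain * mr


def encode_band(reference, target, gain, offset, step):
    """Quantized residual indices of the target band, given the reference."""
    return [int(math.floor((t - (gain * r + offset)) / step + 0.5))
            for r, t in zip(reference, target)]


def decode_band(reference, indices, gain, offset, step):
    """Receiver: rebuild the target band from the intact reference band."""
    return [gain * r + offset + q * step for r, q in zip(reference, indices)]


ref = [50, 60, 70, 80, 90]    # reference band, sent as PCM
tgt = [31, 36, 40, 46, 50]    # band to be predicted
g, c = fit_band_predictor(ref, tgt)
print(g, c, encode_band(ref, tgt, g, c, 1.0))
```

When two bands are strongly correlated, the residual indices cluster around zero and can be coded in far fewer bits than the band itself; how much is gained for real MSS bands is exactly what the simulation recommended in Section 8.0 would establish.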
Figure 4-5. DPCM for multispectral data.









5.0 DYNAMIC COMPANDING 


Histogram analyses of imagery obtained with Landsat-type multispectral
sensors routinely show that, for a given frame, the full available
dynamic range of pixel amplitude is seldom required. The histograms are
generally narrow, but their mean values shift, so that a wide range of
probable values is required to accommodate the set of frames obtained as
the satellite progresses in orbit.

Figure 5-1 shows a Landsat image of the North Carolina coastal area. Two
areas, outlined by rectangles, have been selected as candidates for
histogram analysis to demonstrate the above point. The first area is
inland and represents sparsely populated rural terrain (it is shown
enlarged in Figure 5-2). The second provides a good land/water contrast
comparison (it is shown enlarged in Figure 5-7). Figures 5-3 through 5-6
and 5-8 through 5-11 show histograms for the first and second areas,
respectively. Each numbered bin on the histograms represents 2 counts in
a 256-level gray scale; the data, however, have a possible range of only
128 gray levels (7-bit representation). Notice that for these areas the
histograms are generally contained in the first fifty or sixty gray
levels, with several being even narrower. This implies that a compression
of 2-3 is possible if the data are encoded with, say, a 5- or 6-bit
representation.

Thus, variable-length encoding, with the code length and the histogram
position within the 7-bit code inserted (probably) into the secondary
header, would provide further compression beyond that achieved with
band-to-band DPCM as described in the previous section.
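The dynamic companding step can be sketched per block of pixels: find the minimum and maximum levels, choose the shortest word that spans them, and send offsets from the minimum. The block partitioning and the choice of (minimum level, word length) as header fields are assumptions of this sketch; the volume specifies only that the code length and histogram position would go into a header.

```python
def compand_block(pixels):
    """Return (minimum level, word length in bits, offsets) for one block of
    integer pixel levels. The first two values are the header side information."""
    lo, hi = min(pixels), max(pixels)
    # Offsets run from 0 to hi - lo; at least one bit even for a flat block.
    word_bits = max(1, (hi - lo).bit_length())
    return lo, word_bits, [p - lo for p in pixels]


def expand_block(lo, offsets):
    """Receiver: restore the original levels from the header minimum."""
    return [lo + d for d in offsets]
```

For a 7-bit sensor, a block whose levels run from 20 to 79 (a span of 60 levels) is coded with 6-bit words plus the two header fields, e.g. `compand_block([20, 35, 60, 79])` returns `(20, 6, [0, 15, 40, 59])`.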

It should be noted that exactly how band-to-band DPCM and dynamic
companding should, or even can, work together is not at all clear. The
approaches have been described independently of one another, and no
detailed examination of their interaction has been made. This, as
described in Section 8.0, is an area which should be addressed by further
study.
Figure 5-1. MSS (band 6) from Landsat 2, North Carolina coastal waters (areas of analysis are
outlined by rectangles).
Figure 5-2. Analysis area no. 1, rural terrain.
Figure 5-3. Histogram of first analysis area, band 4.
Figure 5-4. Histogram of first analysis area, band 5.
Figure 5-5. Histogram of first analysis area, band 6.
Figure 5-8. Histogram of analysis area no. 2, band 4.
Figure 5-9. Histogram of analysis area no. 2, band 5.
Figure 5-10. Histogram of analysis area no. 2, band 6.
6.0 ADDITIONAL TECHNIQUES FOR DATA RATE REDUCTION


Other, fairly obvious but less desirable means of reducing the data rate
include swath-width reduction, along-track truncation, and pixel
averaging. The implementation of any of these should be data dependent
and therefore adaptive. Swath-width reduction and along-track truncation
are simply extensions of the FILE algorithm to other scene constituents.
Whether pixels are averaged, and to what degree, is clearly application
driven. It is hoped that the use of any of these can be avoided through
the use of DPCM and dynamic companding as described in the previous
sections.
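Pixel averaging, the simplest of these techniques, reduces the data rate by the square of the averaging factor. A sketch that averages non-overlapping square blocks is shown below; dropping edge pixels that do not fill a whole block is a simplifying assumption.

```python
def average_pixels(image, factor):
    """Average non-overlapping factor x factor blocks of a list-of-rows image.
    Edge rows/columns that do not fill a whole block are dropped (assumption)."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for bi in range(rows):
        row = []
        for bj in range(cols):
            block = [image[bi * factor + di][bj * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

A factor of 2 cuts the pixel count, and hence the rate, by 4 at the cost of halving resolution in both directions, which is why the text treats it as a last resort.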
7.0 PACKETIZATION


The effect of data set selection on the packetization concept is
minimal. Both the FILE algorithm and dynamic companding require some
additional information in the packet header: namely, the locations of
the deleted pixels in the case of FILE, and the max/min levels in the
case of dynamic companding. Use of FILE strengthens any argument for
variable-length packets, while dynamic companding requires variable-length
words within the data packet itself.
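The additional header information can be illustrated with a hypothetical layout: deleted-pixel locations carried as alternating clear/cloud run lengths (run-length coding is an assumption; the volume does not specify an encoding), preceded by the companding minimum level and word length. The field sizes in `HEADER` are placeholders, not a NEEDS packet format.

```python
import struct


def mask_to_runs(cloud_flags):
    """Run-length encode a cloud mask as alternating run lengths, starting
    with a clear run (which may be zero)."""
    runs = []
    current = False  # runs alternate clear, cloud, clear, ...
    length = 0
    for flag in cloud_flags:
        if flag == current:
            length += 1
        else:
            runs.append(length)
            current = flag
            length = 1
    runs.append(length)
    return runs


def runs_to_mask(runs):
    """Inverse of mask_to_runs."""
    mask, flag = [], False
    for length in runs:
        mask.extend([flag] * length)
        flag = not flag
    return mask


# Hypothetical layout: min level (1 byte), word length (1 byte), run count (2 bytes),
# then the runs as 16-bit values (runs longer than 65535 would raise struct.error).
HEADER = struct.Struct(">BBH")


def pack_header(min_level, word_bits, runs):
    return HEADER.pack(min_level, word_bits, len(runs)) + struct.pack(f">{len(runs)}H", *runs)


def unpack_header(data):
    min_level, word_bits, n_runs = HEADER.unpack_from(data, 0)
    runs = list(struct.unpack_from(f">{n_runs}H", data, HEADER.size))
    return min_level, word_bits, runs
```

Because the number of runs depends on the scene, the header length varies from packet to packet, which is one concrete form of the argument for variable-length packets made above.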
8.0 RECOMMENDATIONS


In view of the preceding discussion, it is recommended that studies be
conducted to:

(1) Evaluate the potential performance of band-to-band DPCM by direct
simulation. This would establish whether a temporal-spatial-spectral
approach is feasible and would provide the tool to synthesize the
specific technique.

(2) Evaluate the potential performance improvement available with
dynamic companding. This again can be achieved with simple simulation
and the accumulation of imagery gray scale statistics.

(3) Outline an overall adaptive strategy for using the various
compression techniques discussed here, both collectively and
selectively, to achieve greater data compression in a data-dependent
environment.

(4) Assess the hardware requirements, power consumption, and throughput
rates in implementing items (1), (2), and (3) with state-of-the-art
technology such as VHSIC/VLSI.